@Andy-Jost commented Jan 22, 2026

Summary

Converts _module.py to Cython (_module.pyx) for improved performance, adding RAII-based resource handle management for CUkernel and CUlibrary driver objects.

  • Converts Kernel, ObjectCode, KernelOccupancy, and KernelAttributes to cdef class
  • Adds LibraryHandle and KernelHandle to the resource_handles C++ infrastructure
  • Replaces Python-level driver API calls with cydriver calls wrapped in nogil blocks
  • Adds .pxd file for cross-module cimport support

Changes

  • _module.py → _module.pyx with cdef class definitions
  • New _module.pxd with typed attribute and method declarations
  • Extended _cpp/resource_handles.{hpp,cpp} with library/kernel handle types
  • Updated _resource_handles.{pxd,pyx} with new handle functions
  • Updated _launcher.pyx to directly access kernel handles via cimport
  • Minor updates to _linker.py, _program.py to use new factory methods
  • Test updates for API changes

Test plan

  • All existing test_module.py tests pass
  • All existing test_program.py tests pass
  • CI passes

Convert Kernel, ObjectCode, and KernelOccupancy to cdef classes with
proper .pxd declarations. This phase establishes the Cython structure
while maintaining Python driver module usage.

Changes:
- Rename _module.py to _module.pyx
- Create _module.pxd with cdef class declarations
- Convert Kernel, ObjectCode, KernelOccupancy to cdef class
- Remove _backend dict in favor of direct driver calls
- Add _init_py() Python-accessible factory for ObjectCode
- Update _program.py and _linker.py to use _init_py()
- Fix test to handle cdef class property descriptors

Phase 2b will convert driver calls to cydriver with nogil blocks.
Phase 2c will add RAII handles to resource_handles.

- Use strong types in .pxd (ObjectCode, KernelOccupancy)
- Remove cdef public - attributes now private to C level
- Add Kernel.handle property for external access
- Add ObjectCode.symbol_mapping property (symmetric with input)
- Update _launcher.pyx, _linker.py, tests to use public APIs

- Module globals: _inited, _py_major_ver, _py_minor_ver, _driver_ver,
  _kernel_ctypes, _paraminfo_supported -> cdef typed
- Module functions: _lazy_init, _get_py_major_ver, _get_py_minor_ver,
  _get_driver_ver, _get_kernel_ctypes, _is_paraminfo_supported,
  _make_dummy_library_handle -> cdef inline with exception specs
- Module constant: _supported_code_type -> cdef tuple
- Kernel._get_arguments_info -> cdef tuple

Note: KernelAttributes remains a regular Python class due to
segfaults when converted to cdef class (likely due to weakref
interaction with cdef class properties).

Follow the _MemPoolAttributes pattern:
- cdef class with inline cdef attributes (_kernel_weakref, _cache)
- _init as @classmethod (not @staticmethod cdef)
- _get_cached_attribute and _resolve_device_id use except? -1
- Explicit cast when dereferencing weakref

Extends the RAII handle system to support CUlibrary and CUkernel driver
objects used in _module.pyx. This provides automatic lifetime management
and proper cleanup for library and kernel handles.

Changes:
- Add LibraryHandle/KernelHandle types with factory functions
- Update Kernel, ObjectCode, KernelOccupancy to use typed handles
- Move KernelAttributes cdef block to .pxd for strong typing
- Update _launcher.pyx to access kernel handle directly via cdef

Replaces Python-level driver API calls with low-level cydriver calls
wrapped in nogil blocks for improved performance. This allows the GIL
to be released during CUDA driver operations.

Changes:
- cuDriverGetVersion, cuKernelGetAttribute, cuKernelGetParamInfo
- cuOccupancy* functions (with appropriate GIL handling for callbacks)
- cuKernelGetLibrary
- Update KernelAttributes._get_cached_attribute to use cydriver types

@Andy-Jost added the enhancement and cuda.core labels Jan 22, 2026
@Andy-Jost self-assigned this Jan 22, 2026
copy-pr-bot bot commented Jan 22, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@Andy-Jost
Contributor Author

/ok to test 82d92c9

Remove type annotation from handle parameter to prevent Cython's
automatic float-to-int coercion, which caused a segmentation fault.
The manual isinstance check properly validates all non-int types.
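
The validation approach described in this commit can be illustrated in plain Python. This is only a sketch: `validate_handle` is a hypothetical helper name, not a function from the PR, and the real check lives inside the Cython code. The point is that the parameter carries no integer annotation, so a float reaches the explicit `isinstance` check instead of being converted first.

```python
def validate_handle(handle):
    """Return `handle` unchanged if it is an int, otherwise raise TypeError.

    Hypothetical helper for illustration; the parameter is deliberately
    untyped so that no implicit numeric conversion happens before the check.
    """
    if not isinstance(handle, int):
        raise TypeError(
            f"handle must be an int, got {type(handle).__name__}"
        )
    return handle
```

With this shape, `validate_handle(1.5)` raises `TypeError` rather than silently passing a truncated value on to the driver.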
Four tests in test_utils.py relied on CuPy implicitly creating a CUDA
context but failed when pytest-randomly ordered them after tests using
the init_cuda fixture, which pops the context on cleanup.
@Andy-Jost
Contributor Author

/ok to test fde13ae

- Change ObjectCode._init from cdef to @classmethod def, matching the
  pattern used by Buffer, Stream, Graph, and other classes
- Remove _init_py wrapper (no longer needed)
- Update callers in _program.py and _linker.py
- Add test_kernel_keeps_library_alive to verify that a Kernel keeps its
  underlying library alive after ObjectCode goes out of scope
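
The lifetime property that test_kernel_keeps_library_alive targets can be sketched in pure Python. The PR implements its handles in C++ (resource_handles); the classes below are simplified stand-ins with assumed names and fields, and a log list stands in for the driver's load/unload calls. The kernel handle holds a strong reference to the library handle, so the library is released only after the last kernel is gone.

```python
import gc
import weakref


class LibraryHandle:
    """Stand-in for an owning handle around a loaded library (illustrative)."""

    def __init__(self, name, log):
        self.name = name
        log.append(f"load {name}")
        # Record the "unload" when this object is garbage collected.
        weakref.finalize(self, log.append, f"unload {name}")


class KernelHandle:
    """Stand-in for a kernel handle that depends on its library."""

    def __init__(self, library, symbol):
        self._library = library  # strong reference keeps the library loaded
        self.symbol = symbol


def demo():
    log = []
    library = LibraryHandle("saxpy.cubin", log)
    kernel = KernelHandle(library, "saxpy")

    del library  # analogous to ObjectCode going out of scope
    gc.collect()
    alive_after_drop = "unload saxpy.cubin" not in log

    del kernel  # last owner released
    gc.collect()
    unloaded_after_kernel = "unload saxpy.cubin" in log
    return alive_after_drop, unloaded_after_kernel
```

Calling `demo()` reports whether the library survived the first `del` and whether it was released after the second.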
@Andy-Jost
Contributor Author

/ok to test e9f2275

@Andy-Jost marked this pull request as ready for review January 22, 2026 19:25

- Remove Kernel._module (ObjectCode reference no longer needed since
  KernelHandle keeps library alive via LibraryHandle dependency)
- Simplify Kernel._from_obj signature (remove unused ObjectCode param)
- Replace weakref patterns with direct handle storage:
  - KernelAttributes: store KernelHandle instead of weakref to Kernel
  - _MemPoolAttributes: store MemoryPoolHandle instead of weakref to _MemPool
- Rename get_kernel_from_library to create_kernel_handle for consistency
- Remove fragile annotation introspection from test_saxpy_arguments
- Update test_mempool_attributes_ownership to reflect new ownership semantics
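
A rough Python analogue of the weakref-to-handle change is shown below. All class names are hypothetical and the "handle" is just a small object carrying a value; the sketch only contrasts the two ownership styles and does not reproduce the PR's Cython code. An attributes object that holds a weak reference to its owner stops working once the owner is collected, while one that stores the handle directly stays usable.

```python
import gc
import weakref


class Handle:
    """Illustrative resource handle carrying a single value."""

    def __init__(self, value):
        self.value = value


class Owner:
    """Illustrative owner object (plays the role of a Kernel or pool)."""

    def __init__(self, value):
        self.handle = Handle(value)


class AttributesViaWeakref:
    """Old style: reach the handle through a weak reference to the owner."""

    def __init__(self, owner):
        self._owner_ref = weakref.ref(owner)

    def read(self):
        owner = self._owner_ref()
        if owner is None:
            raise RuntimeError("owner was destroyed")
        return owner.handle.value


class AttributesViaHandle:
    """New style: store the handle itself, sharing ownership of it."""

    def __init__(self, owner):
        self._handle = owner.handle

    def read(self):
        return self._handle.value


def demo():
    owner = Owner(7)
    via_weakref = AttributesViaWeakref(owner)
    via_handle = AttributesViaHandle(owner)
    del owner
    gc.collect()
    try:
        via_weakref.read()
        weakref_ok = True
    except RuntimeError:
        weakref_ok = False
    return weakref_ok, via_handle.read()
```

The trade-off is that the attributes object now extends the lifetime of the underlying resource, which is presumably the ownership change the updated test reflects.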
@Andy-Jost
Contributor Author

/ok to test 46056e6
